On Early Stopping in Gradient Descent Learning

Authors

  • YUAN YAO
  • LORENZO ROSASCO
  • ANDREA CAPONNETTO
Abstract

In this paper, we study a family of gradient descent algorithms for approximating the regression function from Reproducing Kernel Hilbert Spaces (RKHSs), the family being characterized by polynomially decaying step sizes (learning rates). By solving a bias-variance trade-off we obtain an early stopping rule and probabilistic upper bounds for the convergence of the algorithms. These upper bounds yield improved rates in cases where the usual regularized least-squares algorithm fails, and achieve the minimax optimal rate O(m^{-1/2}) in some cases. We also discuss the implications of these results in the context of classification, and address connections with Boosting, Landweber iterations, and online learning algorithms viewed as stochastic approximations of the gradient descent method.
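As a rough illustration of the algorithm family, the sketch below runs kernel least-squares gradient descent with polynomially decaying step sizes eta_t = eta0 * (t + 1)^(-theta). The paper's stopping rule is set a priori from the sample size and a regularity assumption on the regression function; here a held-out validation set is used as a practical stand-in. The Gaussian kernel, the hyperparameter values, and all function names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X1, X2, sigma=1.0):
    # Gaussian (RBF) kernel matrix; an assumed choice of kernel.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def gd_early_stopping(X, y, X_val, y_val, theta=0.5, eta0=1.0, max_iter=500):
    # Gradient descent on the empirical squared risk over the RKHS.
    # The iterate f_t = sum_i c_i K(x_i, .) is stored via its
    # coefficient vector c, updated by c <- c - (eta_t / m)(K c - y).
    m = len(y)
    K = gaussian_kernel(X, X)          # m x m Gram matrix on training data
    K_val = gaussian_kernel(X_val, X)  # kernel between validation and training points
    c = np.zeros(m)
    best_err, best_c = np.inf, c.copy()
    for t in range(max_iter):
        eta = eta0 * (t + 1) ** (-theta)   # polynomially decaying step size
        c -= (eta / m) * (K @ c - y)       # gradient step on empirical risk
        err = np.mean((K_val @ c - y_val) ** 2)
        if err < best_err:                 # validation-based early stopping,
            best_err, best_c = err, c.copy()   # standing in for the a-priori rule
    return best_c, best_err
```

Note that theta = 0 (a constant step size, as in Landweber iteration) also falls within this family; the trade-off is between larger early steps and the stability of later iterations.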

Similar resources

On Early Stopping in Gradient Descent Boosting

In this paper, we study a family of gradient descent algorithms to approximate the regression function from reproducing kernel Hilbert spaces. Here early stopping plays the role of regularization: given a finite sample and some regularity condition on the regression function, a stopping rule is given and some probabilistic upper bounds are obtained for the distance between the function iter...


Geometry of Early Stopping in Linear Networks

A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have...


Project 1 Report: Logistic Regression

In this project, we study learning the Logistic Regression model by gradient ascent and stochastic gradient ascent. Regularization is used to avoid overfitting. Some practical tricks to improve learning are also explored, such as batch-based gradient ascent, data normalization, grid searching, early stopping, and model averaging. We observe the factors that affect the result, and determine thes...
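A minimal sketch of the kind of pipeline described here, assuming batch gradient ascent on the L2-regularized log-likelihood with patience-based early stopping on a validation set; all names and hyperparameter values are illustrative, and the other tricks mentioned (normalization, grid search, model averaging) are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, X_val, y_val, lr=0.1, lam=1e-3, epochs=200, patience=10):
    # Batch gradient ascent on the L2-regularized log-likelihood.
    n, d = X.shape
    w = np.zeros(d)
    best_ll, best_w, wait = -np.inf, w.copy(), 0
    for _ in range(epochs):
        p = sigmoid(X @ w)
        grad = X.T @ (y - p) / n - lam * w   # ascent direction of penalized log-likelihood
        w += lr * grad
        p_val = sigmoid(X_val @ w)
        ll = np.mean(y_val * np.log(p_val + 1e-12)
                     + (1 - y_val) * np.log(1 - p_val + 1e-12))
        if ll > best_ll:                     # keep the best weights seen so far
            best_ll, best_w, wait = ll, w.copy(), 0
        else:
            wait += 1
            if wait >= patience:             # early stopping
                break
    return best_w
```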


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is ...
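For concreteness, one update of preconditioned SGLD in the style this abstract describes might look as follows, with an RMSprop-style diagonal preconditioner and the small curvature-correction term dropped, as is common in practice. This is a sketch under those assumptions, not a verbatim reproduction of the paper's algorithm; `grad` stands for a stochastic gradient of the negative log posterior.

```python
import numpy as np

def psgld_step(theta, grad, v, eps=1e-3, alpha=0.99, lam=1e-5, rng=None):
    # One step of preconditioned stochastic gradient Langevin dynamics.
    rng = rng or np.random.default_rng()
    v = alpha * v + (1 - alpha) * grad ** 2       # running second moment of gradients
    G = 1.0 / (lam + np.sqrt(v))                  # diagonal preconditioner adapted to local geometry
    noise = rng.normal(size=theta.shape) * np.sqrt(eps * G)
    theta = theta - 0.5 * eps * G * grad + noise  # preconditioned drift + injected Gaussian noise
    return theta, v
```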


Early Stopping as Nonparametric Variational Inference

We show that unconverged stochastic gradient descent can be interpreted as a procedure that samples from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy over these distributions during optimization, we form a scalable, unbiased estim...
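One way to make the entropy-tracking idea concrete: a gradient descent step with step size eta pushes the implicit distribution through a map whose Jacobian is I - eta*H (H the Hessian), so the entropy changes by log|det(I - eta*H)|, roughly -eta*tr(H) for small eta. The sketch below estimates that trace with Hutchinson probes and finite-difference Hessian-vector products; it is a simplified stand-in for the paper's estimator, and every name and parameter here is an assumption.

```python
import numpy as np

def entropy_change(grad_fn, theta, eta, n_probe=10, delta=1e-4, rng=None):
    # Estimate log|det(I - eta*H)| ~ -eta * tr(H) at theta (1-D array),
    # using Hutchinson's trace estimator with finite-difference
    # Hessian-vector products built from the gradient function.
    rng = rng or np.random.default_rng()
    g0 = grad_fn(theta)
    tr = 0.0
    for _ in range(n_probe):
        z = rng.choice([-1.0, 1.0], size=theta.shape)    # Rademacher probe vector
        hvp = (grad_fn(theta + delta * z) - g0) / delta  # approximate H @ z
        tr += z @ hvp
    tr /= n_probe
    return -eta * tr
```

Summing these per-step changes along the optimization trajectory, plus the entropy of the initial distribution, gives a running estimate of the entropy of the implicit approximate posterior.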



Publication date: 2005